Risk prediction and interaction analysis using polygenic risk score of type 2 diabetes in a Korean population

Joint modelling of genetic and environmental risk factors can provide important information to predict the risk of type 2 diabetes (T2D). Therefore, to predict the genetic risk of T2D, we constructed a polygenic risk score (PRS) using genotype data of one Korean cohort, KARE (745 cases and 2549 controls), and the genome-wide association study summary statistics of Biobank Japan. We evaluated the performance of PRS in an independent Korean cohort, HEXA (5684 cases and 35,703 controls). Individuals with T2D had a significantly higher mean PRS than controls (0.492 vs. −0.078, p ≈ 0). PRS predicted the risk of T2D with an AUC of 0.658 (95% CI 0.651–0.666). We also evaluated interaction between PRS and waist circumference (WC) in the HEXA cohort. PRS exhibited a significant sub-multiplicative interaction with WC (OR_interaction 0.991, 95% CI 0.987–0.995, p_interaction = 4.93 × 10⁻⁶) in T2D. The effect of WC on T2D decreased as PRS increased. The sex-specific analyses produced similar interaction results, revealing a decreased WC effect on T2D as the PRS increased. In conclusion, the risk of WC for T2D may differ depending on PRS and those with a high PRS might develop T2D with a lower WC threshold. Our findings are expected to improve risk prediction for T2D and facilitate the identification of individuals at an increased risk of T2D.


PRS-WC interaction
The results of the analyses of the main-effect-only model and the joint model are shown in Table 3. A larger WC was associated with an increased risk of T2D. We found a sub-multiplicative interaction between PRS and WC with respect to the risk of T2D (OR_interaction 0.991, 95% CI 0.987-0.995, p_interaction = 4.93 × 10⁻⁶). WC ≥ 90 cm in men and WC ≥ 85 cm in women is defined as abdominal obesity 14. The risk of T2D associated with PRS differed when stratified by WC, and weaker associations were observed among individuals with abdominal obesity (Supplementary Table S3). The effect size of the association between PRS and the risk of T2D was OR 1.758 (95% CI 1.665-1.855, p ≈ 0) in individuals with abdominal obesity and OR 2.083 (95% CI 2.001-2.168, p ≈ 0) in those without. Analyses of the KARE cohort gave similar results, showing a significant sub-multiplicative interaction between PRS and WC in T2D development (Supplementary Tables S3 and S4). However, no significant additive interaction was observed between abdominal obesity and dichotomized PRS, where the low and high genetic risk groups comprised individuals with a PRS below the median and at or above the median, respectively. In the corresponding analysis, there was a significant sub-multiplicative interaction between abdominal obesity and dichotomized PRS. The estimated relative excess risk due to interaction (RERI) was 0.023 (95% CI −0.298 to 0.349), while OR_interaction was 0.771 (95% CI 0.677-0.879).
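To make the size of an interaction odds ratio just below 1 more concrete, the following minimal sketch shows how it scales the PRS odds ratio as WC increases. It assumes, which the text does not state explicitly, that the interaction odds ratio is expressed per 1 cm of WC and per 1 SD of PRS in a logistic model with a PRS × WC product term; the function name and the 10 cm difference are illustrative only.

```python
def prs_or_ratio(or_interaction: float, delta_wc_cm: float) -> float:
    """Factor by which the per-SD PRS odds ratio changes when WC rises by delta_wc_cm.

    In a logistic model with a PRS x WC product term, the log-odds coefficient
    of PRS at a given WC is beta_prs + beta_int * WC, so moving WC by delta
    multiplies the PRS odds ratio by exp(beta_int * delta) = or_interaction ** delta.
    """
    return or_interaction ** delta_wc_cm


if __name__ == "__main__":
    # Reported point estimate and 95% CI limits of the interaction odds ratio.
    # The per-cm scaling is an assumption of this sketch.
    for label, value in [("estimate", 0.991), ("lower", 0.987), ("upper", 0.995)]:
        print(label, round(prs_or_ratio(value, 10.0), 3))
```

Under these assumptions, a 10 cm larger WC corresponds to a PRS odds ratio that is roughly 9% smaller, which is in line with the weaker PRS association seen in the stratum with abdominal obesity.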

Discrimination results
We examined the discrimination ability of the PRS stratified by abdominal obesity and observed that the PRS discriminated individuals with abdominal obesity less well than individuals without abdominal obesity. The AUC was 0.630 (95% CI 0.616-0.643) in individuals with abdominal obesity and 0.679 (95% CI 0.670-0.688) in those without. The results for the KARE cohort were similar to those for the HEXA cohort (Supplementary Table S5).
We fitted a prediction model using WC only, which had an AUC of 0.694. Incorporating PRS with WC improved the AUC to 0.750. A model incorporating age, sex, BMI, and WC showed an AUC of 0.747. Adding PRS to age, sex, BMI, and WC gave a higher AUC (0.794) than the model with age, sex, BMI, and WC alone. The results are provided in Supplementary Table S6.
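For readers less familiar with the AUC, it can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes it by direct pairwise comparison on toy data; it is an illustration of the definition, not the software used in the study, and the function name and example values are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney form).

    scores: predicted risk (for example a PRS or a model probability).
    labels: 1 for a case, 0 for a control.
    A case/control pair with tied scores counts as half a concordant pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    toy_scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical scores
    toy_labels = [0, 0, 1, 1]
    print(auc(toy_scores, toy_labels))
```

The double loop is quadratic in sample size, so for a cohort of the size of HEXA a rank-based implementation from a statistics package would be the practical choice.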

Sex-specific analysis
We performed interaction analyses and discrimination evaluations stratified by sex (Table 3 and Supplementary Table S5). Similar to the results of the combined analysis, in men, WC and PRS were each associated with an increased risk of T2D, there was a significant sub-multiplicative interaction between PRS and WC, and the discriminatory performance of PRS was better in individuals without abdominal obesity than in those with abdominal obesity. For women, the results were similar to those for men.

Discussion
In this study, we constructed a PRS for T2D based on 1004 single nucleotide polymorphisms (SNPs) in 3294 subjects in the KARE cohort using GWAS summary statistics of Biobank Japan 7 and evaluated the PRS in 41,387 subjects from the HEXA cohort. We found that a one-SD increase in PRS was significantly associated with a 1.964-fold increased risk of T2D. The diagnostic accuracy of the PRS, measured by the AUC, was 0.658. When PRS was divided into quartiles, individuals in the highest-risk group had a 5.132-fold increased risk compared to those in the lowest-risk group. There was a significant sub-multiplicative interaction between PRS and WC, and PRS had a greater effect in individuals without abdominal obesity than in those with abdominal obesity. Overall, our study shows the potential utility of the PRS for identifying individuals at high risk of T2D who require preventive measures in the Korean population.
As our study showed, PRS has great potential to identify and stratify individuals at risk of disease, predict disease risk, and contribute to precision medicine. Because of the importance of PRS, many methods for computing it have been developed. These include clumping-and-thresholding (CT) (PRSice2 15), Bayesian approaches (LDpred 16 or LDpred2 17, PRS-CS 18, SBayesR 19), and a penalized regression method (Lassosum 20). CT relies on clumping and p-value thresholding to select SNPs for PRS construction. The Bayesian methods differ in the prior used to infer the posterior mean effects of SNPs: LDpred 16 and LDpred2 17 assign a point-normal prior to SNP effect sizes, PRS-CS 18 uses a continuous shrinkage prior on SNP effect sizes, and SBayesR 19 uses a prior that consists of a point mass at zero along with a mixture of normal distributions. Lassosum 20 uses the lasso to select SNPs and construct a PRS from GWAS summary statistics. Our PRS model, constructed with one of the most commonly used PRS methods, PRSice2 15, showed a significant association with T2D.

Table 3.
Results under the main-effect-only models and under the joint effect model incorporating interaction between PRS and WC for combined and sex-stratified analyses. The main-effect-only model for WC included covariates and WC. The main-effect-only model for PRS included covariates and PRS. The joint effect model incorporating interaction included covariates, WC, PRS, and the interaction term between PRS and WC. For the combined analysis, covariates were age, BMI, and sex. For sex-stratified analyses, the covariates were age and BMI. The WC was measured three times, and the average value was used. OR odds ratio, CI confidence interval, PRS polygenic risk score, WC waist circumference, BMI body mass index.

We showed that PRS was a strong risk factor for T2D, with an OR of 1.964. However, the diagnostic accuracy of PRS was only moderate, with an AUC of 0.658. For assessing diagnostic accuracy, an AUC above 0.70 is considered acceptable. It should be taken into account that no other clinical or environmental risk factors were included in this risk model. The PRS-only model amounts to predicting someone's risk of T2D from PRS alone, without any clinical information. It is fair to expect that the AUC would increase when other clinical risk factors, including age, sex, BMI, and WC, are included in the model.
There are arguments that adding PRS to clinical risk factors does not improve AUC. This is based on the fact that a clinical risk model including fasting glucose and hemoglobin A1c (HbA1c) already achieves an AUC of up to 0.90 21. The PRS is considered to aid risk stratification and, therefore, the identification of high-risk individuals 22. Our results showed that the OR for T2D in the highest PRS quartile relative to the lowest PRS quartile was 5.132. Similarly, individuals with a PRS in the top 5% had a 4.192-fold increased risk compared to those with a PRS in the remaining 95%. As many as 34% of individuals with a PRS in the top 5% had T2D. These individuals should be targeted for preventive measures and earlier screening.
We investigated the multiplicative and additive interactions between PRS and WC, a non-genetic risk factor for T2D. WC reflects abdominal obesity and is a well-known risk factor for T2D. As expected, WC was a significant predictor of T2D in both cohorts. Although we did not find evidence of a significant additive interaction between abdominal obesity and PRS status, we observed a significant negative multiplicative interaction between PRS and WC. The genetic effect estimate of PRS was larger in individuals without abdominal obesity (smaller WC) than in those with abdominal obesity (larger WC). Many of the T2D genetic risk loci are associated with decreased beta-cell function 23. We speculate that individuals with abdominal obesity have a higher environmental risk of T2D, so that the relative effect of PRS is modest. In contrast, for those without abdominal obesity, whose environmental risk of T2D is lower, the genetic risk reflected in PRS would exert a larger effect. This finding also suggests that non-obese individuals with T2D may have a higher genetic risk of T2D. In addition, when stratified by abdominal obesity, the discriminatory performance of PRS in terms of AUC was higher for individuals without abdominal obesity than for individuals with abdominal obesity. This finding implies that PRS is a more important risk predictor of T2D in individuals without abdominal obesity.
The strengths of our study include the following. First, we used GWAS summary statistics derived from a Japanese population, whose ancestry is relatively close to that of Koreans. Second, we used two large independent cohorts to train and validate the PRS model. Lastly, we investigated the interaction between PRS and WC, a key environmental factor of T2D. However, this study has certain limitations. The main analysis was based on case-control logistic regression because there was insufficient longitudinal follow-up information in the HEXA cohort. It would have been more informative to predict incident T2D cases using PRS. Also, we did not compare PRSs built from GWAS summary statistics with and without inclusion of BMI as a covariate.
In summary, this study suggests that PRS can be utilized as a screening strategy for the genetically high-risk T2D group. In addition, there is a sub-multiplicative interaction between WC and PRS in T2D, and these findings shed light on the joint etiology of PRS and WC in T2D. Future studies with larger sample sizes are needed to replicate our findings and examine the characteristics at the extreme end of the PRS distribution in terms of the interaction effect between WC and PRS.

Study population
As a two-stage study, we used two Korean cohorts: the KARE cohort as a training set to develop a genome-wide PRS in the first stage, and the HEXA cohort as a test set to evaluate the effectiveness of the T2D PRS and perform interaction analysis in the second stage. Both cohorts are currently assessed as part of the Korean Genome and Epidemiology Study 24. We performed the analysis using the data of individuals who had complete information on genetic variations, phenotype, WC, and covariates such as age, sex, and BMI. We used 3294 individuals (745 cases and 2549 controls) from the KARE cohort (age: 40-69 years), who were recruited in 2001 from residents of the urban community of Ansan City and the rural community of Anseong City. From the HEXA cohort, which recruited participants aged 40-79 years, 41,387 individuals (5684 cases and 35,703 controls) were used. All studies were approved by the Institutional Review Board of Sookmyung Women's University. The baseline characteristics of the study population in each cohort are summarized in Table 4. The data are publicly available by submission of the application form to the Korea Disease Control and Prevention Agency (KDCA) (https://biobank.nih.go.kr).

T2D definition
T2D cases were defined by the presence of any one of the following: (1) fasting plasma glucose (FPG) ≥ 126 mg/dL, (2) HbA1c level ≥ 6.5%, (3) use of anti-diabetic medications, or (4) history of diagnosed diabetes. In the KARE study, participants also had 2-h postprandial blood glucose measurements, and the inclusion criteria for T2D cases additionally included a 2-h postprandial blood glucose level ≥ 200 mg/dL. Similarly, prediabetic and nondiabetic healthy subjects were defined sequentially, and the criteria are presented in Supplementary Table S7. Nondiabetic controls were defined as subjects with FPG < 100 mg/dL, no medical history of diagnosed T2D, and a 2-h postprandial blood glucose level < 140 mg/dL.
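The case and control definitions above can be sketched as a small classification function. This is an illustration under stated assumptions, not the study's code: the function and parameter names are hypothetical, the 2-h postprandial glucose value is treated as optional and skipped when missing (an assumption for cohorts without that measurement), and everyone who is neither a case nor a control is labelled "other" because the full sequential criteria are in Supplementary Table S7 and are not reproduced here.

```python
from typing import Optional


def classify_t2d(
    fpg_mg_dl: float,
    hba1c_pct: float,
    on_antidiabetic_medication: bool,
    diagnosed_history: bool,
    ppg_2h_mg_dl: Optional[float] = None,
) -> str:
    """Return "case", "control", or "other" following the thresholds in the text."""
    # Any single criterion is sufficient for case status.
    is_case = (
        fpg_mg_dl >= 126
        or hba1c_pct >= 6.5
        or on_antidiabetic_medication
        or diagnosed_history
        or (ppg_2h_mg_dl is not None and ppg_2h_mg_dl >= 200)
    )
    if is_case:
        return "case"

    # Controls must satisfy all listed conditions. Skipping the 2-h criterion
    # when the measurement is missing is an assumption of this sketch.
    is_control = (
        fpg_mg_dl < 100
        and not diagnosed_history
        and (ppg_2h_mg_dl is None or ppg_2h_mg_dl < 140)
    )
    return "control" if is_control else "other"


if __name__ == "__main__":
    print(classify_t2d(130, 5.5, False, False))      # FPG above the case threshold
    print(classify_t2d(95, 5.4, False, False, 120))  # meets all control criteria
    print(classify_t2d(110, 5.8, False, False))      # neither case nor control
```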

Genotyping
Genomic DNA was extracted from the peripheral blood samples of participants. Genotyping was conducted using the Korea Biobank array (KoreanChip), which was designed by the Center for Genome Science at the Korea National Institutes of Health. The KoreanChip contains approximately 833,535 SNPs that are specific to the Korean population. The locations of the genes were assigned using the National Center for Biotechnology Information Human Genome Build 37 (hg19). SHAPEIT v2 and IMPUTE v2 were used for imputation analysis of genotype data with 1000 Genomes Phase 3 data as a reference panel 24. Detailed information on the KoreanChip has been reported in a previously published article 25.

Polygenic risk scores
PRSs were derived for KARE samples using the imputed genotype data of KARE samples and GWAS summary statistics of Biobank Japan 7 as weights with the PRSice2 15 software. The PRS of an individual i is defined as $\mathrm{PRS}_i = \sum_{j=1}^{M} W_j X_{ij}$, where $X_{ij}$ is the dosage, that is, the expected number of alternative alleles at the j-th SNP for individual i, and $M$ is the number of SNPs included in the PRS. $W_j$ is the weight of the j-th SNP, which is the log OR of its association with T2D obtained from GWAS summary statistics of the discovery set. We used Biobank Japan 7 as the discovery set. PRSs were calculated using P-value thresholds of ≤ 5 × 10⁻⁸, ≤ 5.005 × 10⁻⁵, ≤ 1.0005 × 10⁻⁴, …, ≤ 0.5 in steps of 5 × 10⁻⁵, and the full model including all SNPs (≤ 1), with LD pruning parameters of r² = 0.1 over 1000-kb windows. The exclusion criteria for SNPs used in constructing the PRS, applied to both the discovery set (Biobank Japan) and the target set (KARE), were an imputation info score < 0.9 and a minor allele frequency < 0.01. The explained variance (Nagelkerke pseudo-R²) was derived from a logistic regression model in which PRS was a predictor while controlling for the covariates, compared to a logistic regression model with covariates only. The PRS achieving the maximal explained variance was selected. In our analysis, age, BMI, and sex were considered as covariates, and the selected PRS consisted of 1004 SNPs with a P-value threshold of 0.0003 (Supplementary Table S8). To evaluate the PRS constructed from KARE, PRSs were computed for HEXA samples using the selected 1004 SNPs by multiplying the dosage of each SNP by the log OR from GWAS summary statistics of Biobank Japan 7 and summing over SNPs. PRSs standardized to a mean of 0 and a variance of 1 were used for all analyses.
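The scoring step described above, a weighted sum of dosages over the SNPs passing a P-value threshold followed by standardization, can be sketched in a few lines. This is a minimal illustration on made-up numbers, not the PRSice2 implementation: LD pruning and quality filters are assumed to have been applied already, the function names and toy values are hypothetical, and the use of the population standard deviation for standardization is an assumption.

```python
from statistics import mean, pstdev


def polygenic_risk_scores(dosages, log_odds_ratios, p_values, p_threshold):
    """Weighted sum of allele dosages over SNPs with GWAS P-value <= p_threshold.

    dosages: one row per individual, one column per SNP (expected allele counts, 0 to 2).
    log_odds_ratios: per-SNP weights W_j from the discovery GWAS.
    p_values: per-SNP GWAS P-values used for thresholding.
    """
    kept = [j for j, p in enumerate(p_values) if p <= p_threshold]
    return [sum(log_odds_ratios[j] * row[j] for j in kept) for row in dosages]


def standardize(scores):
    """Rescale scores to mean 0 and variance 1 (population SD, an assumption)."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


if __name__ == "__main__":
    # Toy data: 3 individuals x 3 SNPs, all values hypothetical.
    toy_dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
    toy_log_or = [0.10, -0.05, 0.20]
    toy_p = [1e-9, 2e-4, 0.04]
    raw = polygenic_risk_scores(toy_dosages, toy_log_or, toy_p, p_threshold=3e-4)
    print(raw)
    print(standardize(raw))
```

In the study, the threshold was chosen as the one that maximized the Nagelkerke pseudo-R² in KARE; repeating the scoring function over a grid of thresholds and keeping the best-performing one would mirror that selection step.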

Interaction analysis
We investigated multiplicative and additive interactions. Multiplicative interaction was evaluated with a likelihood ratio test comparing logistic regression models fitted with and without the interaction term. Additive interaction between abdominal obesity and dichotomized PRS was assessed by RERI. Specifically, we dichotomized PRS at its median and compared individuals at or above the median to those below the median. RERI is expressed using the following formula: $\mathrm{RERI} = \mathrm{RR}_{11} - \mathrm{RR}_{01} - \mathrm{RR}_{10} + 1$, where RR is the relative risk; the reference group consisted of individuals in the lower 50% of T2D genetic risk and without abdominal obesity; $\mathrm{RR}_{01}$ is the relative risk for individuals in the lower 50% of T2D genetic risk and with abdominal obesity; $\mathrm{RR}_{10}$ is that for individuals in the upper 50% of T2D genetic risk and without abdominal obesity; and $\mathrm{RR}_{11}$ is that for individuals in the upper 50% of T2D genetic risk and with abdominal obesity.
https://doi.org/10.1038/s41598-024-55945-2
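The two interaction measures described in this section can be illustrated with a short sketch. It assumes the relative risks and the maximized log-likelihoods of the two logistic models are already available from a fitting routine; the function names and the numbers in the example are hypothetical and are not taken from the study. The likelihood ratio test assumes a single interaction term, so the statistic is referred to a chi-square distribution with 1 degree of freedom.

```python
import math


def reri(rr_11: float, rr_10: float, rr_01: float) -> float:
    """Relative excess risk due to interaction: RR11 - RR01 - RR10 + 1.

    A value of 0 means no departure from additivity of the two exposures.
    """
    return rr_11 - rr_01 - rr_10 + 1.0


def lrt_p_value_1df(loglik_with_interaction: float, loglik_without: float) -> float:
    """P-value of a likelihood ratio test for one added parameter.

    For 1 degree of freedom, the chi-square upper tail at x equals
    erfc(sqrt(x / 2)), so no statistics package is needed.
    """
    statistic = 2.0 * (loglik_with_interaction - loglik_without)
    statistic = max(statistic, 0.0)  # guard against tiny negative values from rounding
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical relative risks for the three non-reference groups.
    print(reri(rr_11=3.0, rr_10=2.0, rr_01=1.5))
    # Hypothetical log-likelihoods of the models with and without the product term.
    print(lrt_p_value_1df(-100.0, -102.5))
```

In case-control data, odds ratios from logistic regression are commonly substituted for the relative risks in the RERI formula; whether that substitution was made here is not stated in the text.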

Table 1.
Association between polygenic risk score and risk of T2D. a OR from logistic regression models were adjusted for age, sex, and BMI. Q quartile, OR odds ratio, CI confidence interval, PRS polygenic risk score, BMI body mass index, T2D type 2 diabetes.

Table 2.
Risk in high polygenic risk score groups for T2D development. a OR from logistic regression models were adjusted for age, sex, and BMI. OR odds ratio, CI confidence interval, PRS polygenic risk score, BMI body mass index, T2D type 2 diabetes.

Table 4.
Baseline characteristics of the HEXA and KARE cohorts. The mean and standard deviation are shown for continuous variables, and counts and proportions are shown for categorical variables. Ansan and Anseong are the urban and rural communities, respectively. For the KARE cohort, WC was measured three times, and the average value was used. BMI body mass index, WC waist circumference.